Generalized Random Forests
نویسندگان
چکیده
We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method operates at a particular point in covariate space by considering a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: non-parametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
منابع مشابه
Using generalized additive models and random forests to model prosodic prominence in German
The perception of prosodic prominence is influenced by different sources like different acoustic cues, linguistic expectations and context. We use a generalized additive model and a random forest to model the perceived prominence on a corpus of spoken German. Both models are able to explain over 80% of the variance. While the random forests give us some insights on the relative importance of th...
متن کاملA generalized inverse for graphs with absorption
We consider weighted, directed graphs with a notion of absorption on the vertices, related to absorbing random walks on graphs. We define a generalized inverse of the graph Laplacian, called the absorption inverse, that reflects both the graph structure as well as the absorption rates on the vertices. Properties of this generalized inverse are presented, including a matrix forest theorem relati...
متن کاملPageRank and random walks on graphs
We examine the relationship between PageRank and several invariants occurring in the study of random walks and electrical networks. We consider a generalized version of hitting time and effective resistance with an additional parameter which controls the ‘speed’ of diffusion. We will establish their connection with PageRank . Through these connections, a combinatorial interpretation of PageRank...
متن کاملRandomer Forests
Random forests (RF) is a popular general purpose classifier that has been shown to outperform many other classifiers on a variety of datasets. The widespread use of random forests can be attributed to several factors, some of which include its excellent empirical performance, scale and unit invariance, robustness to outliers, time and space complexity, and interpretability. While RF has many de...
متن کاملDetecting gene-by-smoking interactions in a genome-wide association study of early-onset coronary heart disease using random forests
BACKGROUND Genome-wide association studies are often limited in their ability to attain their full potential due to the sheer volume of information created. We sought to use the random forest algorithm to identify single-nucleotide polymorphisms (SNPs) that may be involved in gene-by-smoking interactions related to the early-onset of coronary heart disease. METHODS Using data from the Framing...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017